Basis Function Adaptation in Temporal Difference Reinforcement Learning

Authors

  • Ishai Menache
  • Shie Mannor
  • Nahum Shimkin
Abstract

We examine methods for on-line optimization of the basis functions for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based method and the Cross Entropy method are applied to the basis function adaptation problem. The performance of the proposed algorithms is evaluated and compared using simulation experiments.

Keywords: Reinforcement Learning, Temporal difference algorithm, Cross Entropy method, Radial Basis functions.
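As a rough illustration of the approach described above, the following sketch combines linear TD(0) over Gaussian radial basis functions with a Cross-Entropy loop that adapts the RBF centers so as to reduce the empirical squared TD (Bellman) error. The chain-walk environment, all parameter values, and every function name are illustrative assumptions, not taken from the paper; the paper's gradient-based adaptation method is not shown here.

import numpy as np

# Minimal sketch (not the authors' code): linear TD(0) over Gaussian RBF
# features, with the RBF centers adapted by a simple Cross-Entropy loop that
# minimizes the empirical squared TD (Bellman) error.  The chain-walk MDP and
# all constants below are illustrative assumptions.

N_STATES, GAMMA, N_RBF = 20, 0.95, 5

def rbf_features(s, centers, width):
    # Gaussian RBF feature vector for a scalar state s.
    return np.exp(-((s - centers) ** 2) / (2.0 * width ** 2))

def run_td(centers, width, episodes=50, alpha=0.1, seed=0):
    # Run TD(0) with a fixed basis; return the mean squared TD error.
    rng = np.random.default_rng(seed)
    w, sq_err, n = np.zeros(N_RBF), 0.0, 0
    for _ in range(episodes):
        s = int(rng.integers(N_STATES))
        for _ in range(50):
            # random-walk dynamics with a reward only at the right end
            s2 = int(np.clip(s + rng.choice([-1, 1]), 0, N_STATES - 1))
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            phi, phi2 = rbf_features(s, centers, width), rbf_features(s2, centers, width)
            delta = r + GAMMA * w @ phi2 - w @ phi   # TD error
            w += alpha * delta * phi                  # value-weight update
            sq_err += delta ** 2
            n += 1
            s = s2
    return sq_err / n

def cross_entropy_adapt(iterations=5, pop=20, elite_frac=0.2, width=2.0):
    # Cross-Entropy search over RBF centers (widths kept fixed for brevity).
    rng = np.random.default_rng(1)
    mu = np.linspace(0, N_STATES - 1, N_RBF)   # mean of the sampling distribution
    sigma = np.full(N_RBF, 3.0)                # std-dev of the sampling distribution
    for _ in range(iterations):
        candidates = rng.normal(mu, sigma, size=(pop, N_RBF))
        scores = np.array([run_td(c, width) for c in candidates])
        elite = candidates[np.argsort(scores)[: int(elite_frac * pop)]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

print("adapted RBF centers:", np.round(cross_entropy_adapt(), 2))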


Similar articles

Temporal Difference Learning in Continuous Time and Space

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a p...
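For intuition only, a commonly used continuous-time TD formulation (assumed here, not quoted from the cited paper) defines the TD error as delta(t) = r(t) - V(t)/tau + dV/dt, with the value derivative approximated by a finite difference. All names and constants below are illustrative.

# Hedged sketch of a continuous-time TD error, assuming the formulation
# delta(t) = r(t) - V(t)/tau + dV/dt, with dV/dt approximated by an Euler
# finite difference over a small step dt.

def continuous_td_error(r, v, v_prev, tau=1.0, dt=0.01):
    dv_dt = (v - v_prev) / dt        # finite-difference estimate of dV/dt
    return r - v / tau + dv_dt

# Example with placeholder numbers: reward 0.5, current and previous value estimates.
print(continuous_td_error(r=0.5, v=1.20, v_prev=1.19))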


Automatic speech recognition based on adaptation and clustering using temporal-difference learning

This paper describes a novel approach based on online unsupervised adaptation and clustering using temporal-difference (TD) learning. Temporal-difference learning is a reinforcement learning technique and is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives when interacting with a complex, uncertain environment. The adaptation progres...


Convergence of Reinforcement Learning with General Function Approximators

A key open problem in reinforcement learning is to assure convergence when using a compact hypothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hypothesis class is a linear combination of fixed basis functions, it may diverge with a general (nonlinear) hypothesis class. This paper describes the Bridg...


Model-based reinforcement learning using on-line clustering

A significant issue in representing reinforcement learning agents in Markov decision processes is how to design efficient feature spaces in order to estimate the optimal policy. This study addresses the challenge by proposing a compact framework that employs an on-line clustering approach for building appropriate basis functions. It also performs a state-action trajectory analysis to gai...
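Purely as a generic illustration of the idea (not the cited paper's algorithm), on-line clustering can place basis-function centers as states are visited: a new center is added when a state lies far from every existing center, otherwise the nearest center is nudged toward it. The distance threshold, learning rate, and helper names below are assumptions.

import numpy as np

# Generic on-line clustering sketch for placing basis-function centers; it does
# not reproduce the cited paper's method.

def update_centers(state, centers, threshold=1.0, lr=0.05):
    state = np.asarray(state, dtype=float)
    if not centers:                              # first observation starts a cluster
        return [state]
    dists = [np.linalg.norm(state - c) for c in centers]
    i = int(np.argmin(dists))
    if dists[i] > threshold:                     # novel region: add a new center
        centers.append(state)
    else:                                        # familiar region: move nearest center
        centers[i] = centers[i] + lr * (state - centers[i])
    return centers

# Example usage on a short stream of 2-D states.
centers = []
for s in [(0.0, 0.0), (0.1, 0.2), (3.0, 3.0), (2.9, 3.1)]:
    centers = update_centers(s, centers)
print(len(centers), "centers:", [np.round(c, 2) for c in centers])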


Transfer Learning via Inter-Task Mappings for Temporal Difference Learning

Temporal difference (TD) learning (Sutton and Barto, 1998) has become a popular reinforcement learning technique in recent years. TD methods, relying on function approximators to generalize learning to novel situations, have had some experimental successes and have been shown to exhibit some desirable properties in theory, but the most basic algorithms have often been found slow in practice. Th...
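As a rough, generic illustration of inter-task value transfer (not this paper's specific construction), an action-value function learned in a source task can seed a target-task learner through hand-specified state and action mappings. Everything below is an assumed toy setup.

import numpy as np

# Generic sketch of value-function transfer through inter-task mappings; the
# mappings, task sizes, and names are illustrative assumptions.

def transfer_q(q_source, state_map, action_map, n_target_states, n_target_actions):
    # Initialize a target-task Q-table by reading values through the mappings.
    q_target = np.zeros((n_target_states, n_target_actions))
    for s in range(n_target_states):
        for a in range(n_target_actions):
            q_target[s, a] = q_source[state_map[s], action_map[a]]
    return q_target

# Toy example: a 3-state/2-action source task transferred to a 4-state/2-action target.
q_source = np.arange(6, dtype=float).reshape(3, 2)
state_map = {0: 0, 1: 1, 2: 2, 3: 2}     # maps target states to "similar" source states
action_map = {0: 0, 1: 1}
print(transfer_q(q_source, state_map, action_map, 4, 2))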



Journal:
  • Annals OR

Volume 134, Issue

Pages -

Publication year 2005